1,044 research outputs found

DnaSP v5: A software for comprehensive analysis of DNA polymorphism data

Authors: Excoffier; Hutter; J. Rozas; Nielsen; P. Librado; Rosenberg; Rozas; Scheet; Shendure; Stephens; Tajima; Vingron; Wang; Young
Publication venue: Oxford University Press (OUP)
Publication date: 08/04/2014

The software is available at: http://hdl.handle.net/2445/53451

DnaSP is a software package for comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods that allow extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.

Crossref

Diposit Digital de la Universitat de Barcelona

Disperse—a software system for design of selector probes for exon resequencing applications

Authors: Albert; Dahl; H. Ji; J. Stenberg; M. Zhang; Okou; Porreca; Shendure; Sherry; Stenberg
Publication venue: Oxford University Press

Summary: Selector probes enable the amplification of many selected regions of the genome in multiplex. Disperse is a software pipeline that automates the procedure of designing selector probes for exon resequencing applications.

Crossref

PubMed Central

An Overview of the Use of Neural Networks for Data Mining Tasks

Authors: Alberts B; Alpaydin E; Ando T; Blake CL; Bramer MA; Castanheira LG; Han J; Lu H; Mitchell M; Ni X; Quinlan RJ; Rumelhart DE; Shafer JC; Shendure J; Simić D; Stahl F; Steinwart I; Surjandari I; Wei JS; Widrow B; Witten IH; Zaslavsky B; Zhang D
Publication venue: Wiley
Publication date: 01/01/2012

In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as active research, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Beyond element-wise interactions: identifying complex interactions in biological processes

Authors: A Kahvejian; AJ Tate; B Gourévitch; C Granger; C Zou; Christophe Ladroue; CJ Needham; CWJ Granger; H Parkinson; HW Mewes; J Geweke; J Pearl; J Peirce; J Shendure; J Wu; J Yu; JF Geweke; Jianfeng Feng; K Friston; K Sachs; Keith Kendrick; L Royer; M Ding; M Eichler; M Fletcher; MC Teixeira; N Wiener; O David; PT Spellman; R Aebersold; RA Horn; RS Wang; S Guo; S Klamt; S Mukherjee; Shuixia Guo; SM Kosslyn; T Barrett; Vladimir Brezina; Y Chen
Publication venue: Public Library of Science (PLoS)
Publication date: 22/09/2009

Background: Biological processes typically involve the interactions of a number of elements (genes, cells) acting on each other. Such processes are often modelled as networks whose nodes are the elements in question and whose edges are pairwise relations between them (transcription, inhibition). But more often than not, elements actually work cooperatively or competitively to achieve a task, or an element acts on the interaction between two others, as in the case of an enzyme controlling a reaction rate. We call these types of interaction “complex” and propose ways to identify them from time-series observations. Methodology: We use Granger Causality, a measure of the interaction between two signals, to characterize the influence of an enzyme on a reaction rate. We extend its traditional formulation to the case of multi-dimensional signals in order to capture group interactions, and not only element interactions. Our method is extensively tested on simulated data and applied to three biological datasets: microarray data of the yeast Saccharomyces cerevisiae, local field potential recordings of two brain areas, and a metabolic reaction. Conclusions: Our results demonstrate that complex Granger causality can reveal new types of relation between signals and is particularly suited to biological data. Our approach raises some fundamental issues for the systems biology approach, since finding all complex causalities (interactions) is an NP-hard problem.
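
As a rough illustration of the traditional pairwise formulation that the abstract starts from, the sketch below computes a lag-1 Granger causality index as the log ratio of residual sums of squares between a model that predicts y from its own past and one that also uses the past of x. This is not the authors' multi-dimensional "complex" extension; the function names, the lag of 1, and the simulated coupling are all assumptions made for the example.

```python
import math
import random


def granger_index(x, y):
    """Lag-1 Granger causality from x to y: ln(restricted RSS / full RSS)."""
    # Centre both series so the autoregressions need no intercept term.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    x = [v - mx for v in x]
    y = [v - my for v in y]
    target, y_lag, x_lag = y[1:], y[:-1], x[:-1]

    # Restricted model: y_t = a * y_{t-1} + e_t (least squares, one unknown).
    syy = sum(v * v for v in y_lag)
    sty = sum(t * v for t, v in zip(target, y_lag))
    a = sty / syy
    rss_restricted = sum((t - a * v) ** 2 for t, v in zip(target, y_lag))

    # Full model: y_t = a * y_{t-1} + b * x_{t-1} + e_t (2x2 normal equations).
    sxx = sum(u * u for u in x_lag)
    sxy = sum(u * v for u, v in zip(x_lag, y_lag))
    stx = sum(t * u for t, u in zip(target, x_lag))
    det = syy * sxx - sxy * sxy
    a_full = (sty * sxx - stx * sxy) / det
    b_full = (stx * syy - sty * sxy) / det
    rss_full = sum((t - a_full * v - b_full * u) ** 2
                   for t, v, u in zip(target, y_lag, x_lag))
    return math.log(rss_restricted / rss_full)


def simulate(n=2000, coupling=0.8, seed=0):
    """Hypothetical test signals: x drives y with a one-step delay, not vice versa."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_new = 0.5 * x[-1] + rng.gauss(0, 1)
        y_new = 0.5 * y[-1] + coupling * x[-1] + rng.gauss(0, 1)
        x.append(x_new)
        y.append(y_new)
    return x, y


x, y = simulate()
forward = granger_index(x, y)   # influence of x on y
backward = granger_index(y, x)  # influence of y on x
print(f"x -> y: {forward:.3f}   y -> x: {backward:.3f}")
```

On the simulated signals the x -> y index should be clearly positive while the y -> x index stays near zero. A group-level version would replace the scalar regressions with multivariate ones, which is where the combinatorial cost mentioned in the abstract arises.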

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

MagicViewer: integrated solution for next-generation sequencing data visualization and genetic variation detection and annotation

Authors: E. Zhu; F. Zhao; H. Hou; H. Teng; J. Wu; L. Zhou; McPherson; Medvedev; Q. Bao; Rozen; Shendure; X. Li; Z. Sun
Publication venue: Oxford University Press
Publication date: 01/07/2010

New sequencing technologies, such as Roche 454, ABI SOLiD and Illumina, have developed at an astounding pace, with the advantages of high throughput and reduced time and cost. To satisfy the pressing need to decipher the large-scale data generated by next-generation sequencing, an integrated software tool, MagicViewer, was developed to easily visualize short read mapping and to identify and annotate genetic variation based on the reference genome. MagicViewer provides a user-friendly environment in which large-scale short reads can be displayed in a zoomable interface under a user-defined color scheme in an operating-system-independent manner. It also provides a versatile computational pipeline for genetic variation detection, filtration, annotation and visualization, with search options, functional classification, subset selection, sequence association and primer design. In conclusion, MagicViewer is a sophisticated assembly visualization and genetic variation annotation tool for next-generation sequencing data, which can be widely used in a variety of sequencing-based research, including genome re-sequencing and transcriptome studies. MagicViewer is freely available at http://bioinformatics.zj.cn/magicviewer/

Crossref

PubMed Central

Institute of Psychology,Chinese Academy Of Sciences

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

A Revised Design for Microarray Experiments to Account for Experimental Noise and Uncertainty of Probe Response

Authors: A Halperin; A Harrison; A Held; A Pozhitkov; AE Pozhitkov; Alex E. Pozhitkov; AM Osborn; BE Lang; C Burden; CJ Burden; D Hekstra; D Steger; Diethard Tautz; FF Millenaar; FW Studier; GA Held; H Freundlich; H Yang; J Bryk; J Seo; J Shendure; Jarosław Bryk; JB Fan; Lars Kaderali; N Ono; OV Matveeva; Peter A. Noble; S Li; T Czypionka; T Heim; U Mueckstein; Y Barash
Publication venue: Public Library of Science (PLoS)
Publication date: 26/03/2013

Background: Although microarrays are analysis tools in biomedical research, they are known to yield noisy output that usually requires experimental confirmation. To tackle this problem, many studies have developed rules for optimizing probe design and devised complex statistical tools to analyze the output. However, less emphasis has been placed on systematically identifying the noise component as part of the experimental procedure. One source of noise is the variance in probe binding, which can be assessed by replicating array probes. The second source is poor probe performance, which can be assessed by calibrating the array based on a dilution series of target molecules. Using model experiments for copy number variation and gene expression measurements, we investigate here a revised design for microarray experiments that addresses both of these sources of variance. Results: Two custom arrays were used to evaluate the revised design: one based on 25-mer probes from an Affymetrix design and the other based on 60-mer probes from an Agilent design. To assess experimental variance in probe binding, all probes were replicated ten times. To assess probe performance, the probes were calibrated using a dilution series of target molecules and the signal response was fitted to an adsorption model. We found that significant variance of the signal could be controlled by averaging across probes and removing probes that are nonresponsive or poorly responsive in the calibration experiment. Taking this into account, one can obtain a more reliable signal with the added option of obtaining absolute rather than relative measurements. Conclusion: The assessment of technical variance within the experiments, combined with the calibration of probes, allows the removal of poorly responding probes and yields more reliable signals for the remaining ones. Once an array is properly calibrated, absolute quantification of signals becomes straightforward, alleviating the need for normalization and reference hybridizations.
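
The calibration idea in this abstract (average replicate probes, fit the dilution-series response to an adsorption model, then read absolute concentrations off the fitted curve) can be sketched as follows. The abstract does not say which adsorption model was used; the Langmuir isotherm S = s_max * c / (k + c), the Hanes-Woolf linearisation used to fit it, the function names, and all numbers below are assumptions chosen only for illustration.

```python
def fit_langmuir(concentrations, signals):
    """Fit S = s_max * c / (k + c) using the linear form c/S = c/s_max + k/s_max."""
    xs = list(concentrations)
    ys = [c / s for c, s in zip(concentrations, signals)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares line through the (c, c/S) points.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    s_max = 1.0 / slope       # slope is 1 / s_max
    k = intercept * s_max     # intercept is k / s_max
    return s_max, k


def signal_to_concentration(signal, s_max, k):
    """Invert the fitted isotherm to turn a signal into an absolute concentration."""
    if signal >= s_max:
        raise ValueError("signal is at or above the fitted saturation level")
    return k * signal / (s_max - signal)


# Synthetic dilution series (hypothetical values), three replicate readings each.
concentrations = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
true_s_max, true_k = 1000.0, 2.0
replicates = []
for c in concentrations:
    ideal = true_s_max * c / (true_k + c)
    replicates.append([ideal - 5.0, ideal, ideal + 5.0])

# Averaging across replicates damps the probe-binding variance before fitting.
mean_signals = [sum(r) / len(r) for r in replicates]
s_max, k = fit_langmuir(concentrations, mean_signals)
estimate = signal_to_concentration(500.0, s_max, k)
print(f"s_max={s_max:.1f}  k={k:.3f}  concentration at signal 500: {estimate:.3f}")
```

In a real calibration, a probe whose fitted curve is nearly flat across the dilution series would be a candidate for removal as nonresponsive; the threshold for that decision is not given in the abstract and is left out here.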

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Electronic Archiving System

MPG.PuRe

University of Huddersfield Repository

Rapid flow-sorting to simultaneously resolve multiplex massively parallel sequencing products

Authors: B Steuernagel; D Wang; J Binladen; J Sandberg; J Shendure; M Galan; M Margulies; NJ Lennon; PL Auer; PL Stahl; RA White 3rd; S Lundin; SP Perfetto
Publication venue: Nature Publishing Group

Sample preparation for Roche/454, ABI/SOLiD and Life Technologies/Ion Torrent sequencing is based on amplification of library fragments on the surface of beads prior to sequencing. Commonly, libraries are barcoded and pooled to maximise the sequence output of each sequencing run. Here, we describe a novel approach for normalization of multiplex next-generation sequencing libraries after emulsion PCR. Briefly, amplified libraries carrying unique barcodes are prepared by fluorescent tagging of complementary sequences and then resolved by high-speed flow cytometric sorting of labeled emulsion PCR beads. The protocol is simple and provides an even sequence distribution of multiplex libraries when sequencing the flow-sorted beads. Moreover, since many empty and mixed emulsion PCR beads are removed, the approach gives rise to a substantial increase in sequence quality and mean read length compared with that obtained by standard enrichment protocols.

Crossref

PubMed Central

Visualizing dimensionality reduction of systems biology data

Authors: A Hyvaerinen; A Inselberg; A Saeed; Albert Pritzkau; Andreas Lehrmann; Aydin C. Polatkan; DH Jeong; DJ Lockhart; F Battke; GH Golub; H Abdi; H Hotelling; HF Kaiser; J Shendure; JB Tenenbaum; K Pearson; Kay Nieselt; KQ Weinberger; LK Saul; M Fontes; M Harrower; M Schena; Michael Huber; P Mannfolk; R Karbauskaite; R Tarjan; S Roweis; Z Zhang; Ö Altug-Teber
Publication venue: Springer Science and Business Media LLC
Publication date: 04/06/2012

One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method, the data are projected and visualized in the new coordinate system using scatter plots or profile plots. These methods provide good results if the data have certain properties that become visible in the new coordinate system and that were hard to detect in the original coordinate system. Often, however, the application of only one method does not suffice to capture all important signals. Therefore, several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor, as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21.
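
To make the "project the data into a new coordinate system" step concrete, the sketch below finds the first principal component of a small data matrix by power iteration on its covariance matrix and reports the fraction of variance it explains. It is a generic, dependency-free illustration and not part of SpRay; the function name, the iteration count, and the synthetic data are assumptions.

```python
import math
import random


def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained variance fraction) for the leading PC."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Eigenvalue estimate (Rayleigh quotient) and total variance (trace).
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    # Coordinates of each sample along the new axis.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores, eigenvalue / total


# Hypothetical data: 200 samples spread along one direction in 3-D plus small noise.
rng = random.Random(1)
direction = [2 / math.sqrt(5), 1 / math.sqrt(5), 0.0]
data = []
for _ in range(200):
    t = rng.gauss(0, 3)
    data.append([t * direction[j] + rng.gauss(0, 0.1) for j in range(3)])

component, scores, explained = first_principal_component(data)
alignment = abs(sum(c * u for c, u in zip(component, direction)))
print(f"alignment with true direction: {alignment:.4f}, explained: {explained:.4f}")
```

The scores are what a scatter or profile plot in the new coordinate system would display. A single linear component like this is exactly the case the abstract warns may miss signals, which is the motivation for combining several linear and non-linear methods.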

arXiv.org e-Print Archive

Crossref

Publikationsserver der Universität Tübingen

Comparing Platforms for C. elegans Mutant Identification Using High-Throughput Whole-Genome Sequencing

Authors: Anne C. Hart; DA Wheeler; H Li; Itsik Pe'er; J Shendure; Oliver Hobert; S Sarin; Sumeet Sarin; T Ley; Ye Liu; Yufeng Shen
Publication venue: Public Library of Science
Publication date: 24/12/2008

Whole-genome sequencing represents a promising approach to pinpoint chemically induced mutations in genetic model organisms, thereby short-cutting time-consuming genetic mapping efforts. We compare here the ability of two leading high-throughput platforms for paired-end deep sequencing, SOLiD (ABI) and Genome Analyzer (Illumina; "Solexa"), to achieve the goal of mutant detection. As a test case, we used a mutant C. elegans strain that harbors a mutation in the lsy-12 locus, which we compare to the reference wild-type genome sequence. We analyzed the accuracy, sensitivity, and depth-coverage characteristics of the two platforms. Both platforms were able to identify the mutation that causes the phenotype of the mutant C. elegans strain, lsy-12. Based on a 4 MB genomic region in which individual variants were validated by Sanger sequencing, we observe tradeoffs between rates of false positives and false negatives when using both platforms under similar coverage and mapping criteria. In conclusion, whole-genome sequencing conducted by either platform is a viable approach for the identification of single-nucleotide variations in the C. elegans genome.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central